Fix ASAN build clobbering packages after build on the same system #13
Signed-off-by: Connor Roos <croos@nvidia.com>
Pull request overview
This PR updates the SONiC build system to better support AddressSanitizer (ASAN) builds alongside regular builds by introducing ASAN-suffixed artifact naming and adjusting how debs/docker images are produced and cached.
Changes:
- Rename key deb/docker outputs for ASAN builds (e.g., `swss-asan`, `syncd-asan`, `docker-orchagent-asan`, `docker-syncd-*-asan`) to avoid collisions with regular build artifacts.
- Teach the dpkg deb move step to support renaming via `*_DPKG_DEB_NAME`, enabling “target name != dpkg output filename”.
- Update several `*_DEP_FLAGS` definitions to drop `$(ENABLE_ASAN)` so dependency/cache keys align with the new artifact naming approach.
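The renaming described in the first two bullets can be sketched in shell. This is only an illustration under stated assumptions, not the logic in `slave.mk`: the helper names `asan_name` and `move_deb`, the `y` value for enabling ASAN, and the example deb filenames are hypothetical. Only the `-asan` suffix and the idea that the target filename may differ from the dpkg output filename come from the summary above.

```shell
#!/bin/sh
# Hypothetical helpers; names and values are illustrative, not from slave.mk.

# Append "-asan" to a package or image base name when ASAN is enabled.
# $1 = base name (e.g. swss), $2 = ASAN switch (assumed to be "y" when on)
asan_name() {
    if [ "$2" = "y" ]; then
        printf '%s-asan\n' "$1"
    else
        printf '%s\n' "$1"
    fi
}

# Move a built deb into the target directory under a possibly different
# filename, so the target name need not equal the dpkg output filename.
# $1 = dpkg output file, $2 = target directory, $3 = target filename
move_deb() {
    mkdir -p "$2" && mv -- "$1" "$2/$3"
}
```

With names derived this way, a regular artifact and its ASAN counterpart land at different paths in the same target directory, so neither build overwrites the other.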
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `slave.mk` | Adds deb renaming support via `*_DPKG_DEB_NAME` during artifact move; exports `ENABLE_ASAN` for docker Jinja rendering. |
| `rules/sysmgr.dep` | Removes `$(ENABLE_ASAN)` from sysmgr dep flags. |
| `rules/syncd.mk` | Introduces ASAN-suffixed syncd deb names and dpkg output name mapping. |
| `rules/syncd.dep` | Removes `$(ENABLE_ASAN)` from syncd dep flags. |
| `rules/swss.mk` | Introduces ASAN-suffixed swss deb names and dpkg output name mapping. |
| `rules/swss.dep` | Removes `$(ENABLE_ASAN)` from swss dep flags. |
| `rules/docker-orchagent.mk` | Adds ASAN-suffixed orchagent docker image name and extra deps when ASAN enabled. |
| `rules/docker-orchagent.dep` | Removes `$(ENABLE_ASAN)` from orchagent docker dep flags. |
| `platform/template/docker-syncd-bookworm.mk` | Adds ASAN-suffixed syncd docker base image naming. |
| `platform/mellanox/docker-syncd-mlnx.mk` | Adds extra dbg deps for ASAN syncd docker image. |
| `platform/mellanox/docker-syncd-mlnx.dep` | Removes `$(ENABLE_ASAN)` from Mellanox syncd docker dep flags. |
I suppose the debian/control files must also be changed: we need to add the extra swss-asan and syncd-asan package targets there. https://github.com/sonic-net/sonic-swss/blob/master/debian/control https://github.com/sonic-net/sonic-sairedis/blob/master/debian/control
@copilot make the change vivek asked for |
Signed-off-by: Connor Roos <croos@nvidia.com>
LGTM. Do you have sonic-swss and sonic-sairedis PRs?
Added them to the description.
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:
- scripts/build-timing-report.sh: Parse per-package timing from build logs (HEADER/FOOTER timestamps), generate sorted duration table, phase breakdown, parallelism timeline, and CSV export.
- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph, compute critical path, fan-out/fan-in bottleneck analysis, and generate DOT/JSON output for visualization.
- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O, and Docker container count during builds for resource utilization analysis.

Add "make build-report" target to slave.mk that runs the timing report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
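The "/proc/stat delta (two reads)" approach mentioned in the commit above can be sketched as follows. This is a rough illustration, not the code in scripts/build-resource-monitor.sh: the function name `cpu_busy_pct` is hypothetical, and counting both `idle` and `iowait` as idle time is an assumption of this sketch. The function takes the two sampled `cpu` lines as arguments so the arithmetic can be checked without reading the live file.

```shell
#!/bin/sh
# Hypothetical helper: busy CPU percentage between two samples of the
# aggregate "cpu" line from /proc/stat.
# $1 = first sample line, $2 = second sample line
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            idle = $5 + $6                          # idle + iowait (assumed)
            if (NR == 1) { t0 = total; i0 = idle }
            else         { dt = total - t0; di = idle - i0 }
        }
        END {
            if (dt <= 0) { print 0; exit }          # no elapsed ticks
            printf "%d\n", (dt - di) * 100 / dt
        }'
}
```

On a Linux host the two samples could be taken with `head -n1 /proc/stat` before and after a `sleep`. A single read only gives counters accumulated since boot, which is why a delta between two reads is needed for current utilization.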
…dating udevd rules (sonic-net#26343)

- Why I did it

On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow.

During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view.

When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers systemd issue systemd/systemd#29559, which the maintainers do not consider a bug on the systemd side, so this fix is raised for our use case.

Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4 0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5 0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6 0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7 0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8 0x0000559f295519cf in ?? ()
#9 0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

- How I did it

Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match. These are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.
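The approach described under "How I did it" might look roughly like the sketch below. It is not the code from dpu-udev-manager.sh: the helper `stale_pids`, the use of `pgrep -f` to find candidates, and the exact log wording are assumptions made for illustration. Only the function name `kill_stale_udevd`, the use of `systemctl show`, and the rule "kill every other systemd-udevd --daemon process" come from the description.

```shell
#!/bin/sh
# Sketch only; the real dpu-udev-manager.sh may differ.

# Print every candidate PID that differs from the systemd-managed main PID.
# $1 = main PID reported by systemd, remaining args = candidate PIDs
stale_pids() {
    main_pid=$1
    shift
    for pid in "$@"; do
        [ "$pid" != "$main_pid" ] && printf '%s\n' "$pid"
    done
    return 0
}

kill_stale_udevd() {
    # PID of the udevd instance that systemd itself manages.
    main_pid=$(systemctl show -p MainPID --value systemd-udevd.service)
    # Assumption: pgrep -f (full command line match) finds the daemons.
    candidates=$(pgrep -f 'systemd-udevd --daemon')
    # $candidates is left unquoted on purpose so each PID is one argument.
    for pid in $(stale_pids "$main_pid" $candidates); do
        echo "killing stale initramfs udevd PID $pid (systemd udevd is PID $main_pid)"
        kill "$pid"
    done
}
```

When no candidate differs from the main PID, the loop body never runs, which matches the no-op behavior described for healthy DPUs.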
- How to verify it

- Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step
- Reboot the switch
- Verify no new systemd-udevd coredumps in /var/core/
- Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID )
- Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running)
- Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Depends on:
croos12/sonic-sairedis#10
croos12/sonic-swss#6
Why I did it
ASAN-instrumented deb packages and Docker images had the same filenames as their regular-build counterparts. Building ASAN on top of a regular build on the same system therefore either clobbered the regular artifacts or reused them, even though packages such as the swss and syncd debs must be rebuilt for ASAN. Additionally, ENABLE_ASAN was not exported during Docker image builds, so ASAN_OPTIONS was never set in supervisord.conf at runtime.
How I did it
How to verify it
Verify ASAN is active at runtime:
Tested branch (Please provide the tested image version)
Description for the changelog
Fix ASAN build clobbering packages after build on the same system